metabolis 0.1.4
To characterize the physiological variations of the human urine metabolome with age, body mass index (BMI), and gender, samples from a cohort of adult volunteers from the Saclay research institute (hence the “sac[lay]urine” name) have been analyzed by LC–HRMS (E. A. Thévenot et al. 2015).
In this study conducted by the MetaboHUB French Infrastructure for Metabolomics, urine samples from 184 volunteers were analyzed by reversed-phase (C18) ultrahigh performance liquid chromatography (UPLC) coupled to high-resolution mass spectrometry (LTQ-Orbitrap). Raw data are publicly available on the MetaboLights repository (MTBLS404).
This vignette describes the statistical analysis of the data set from the negative ionization mode (113 identified metabolites at MSI levels 1 or 2):
correction of signal drift (loess model built on QC pools) and batch effects (two batches)
variable filtering (QC coefficent of variation < 30%)
normalization by the sample osmolality
log10 transformation, sample filtering (Hotelling, decile and missing pvalues > 0.001) resulting in the HU_096 sample being discarded
univariate hypothesis testing of significant variations with age, BMI, or between genders (FDR < 0.05)
PCA exploration of the data set; ropls Bioconductor package (E. A. Thévenot et al. 2015)
OPLS(-DA) modeling of age, BMI and gender; ropls Bioconductor package (E. A. Thévenot et al. 2015)
selection of the features which significantly contributes to the discrimination between gender with PLS-DA, Random Forest, or Support Vector Machines classifiers; biosigner Bioconductor package (Rinaudo et al. 2016)
Methods from the metabolis, ropls, and biosigner packages used for the analysis of the sacurine dataset
A Galaxy version of this analysis is available W4M00001 ‘Sacurine-statistics’ on the Workflow4metabolomics.org online infrastructure (Guitton et al. 2017)
Loading the metabolis package
suppressMessages(library(metabolis))
The metRead function reads the data sets and builds the ExpressionSet object. For additional information about ExpressionSet class, see the “An introduction to Biobase and ExpressionSets” documentation from the Biobase package.
3 table format used as input
sacSet <- metabolis::metRead(system.file("extdata/sacurine",
package = "metabolis"))
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 113 features, 210 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: QC1_001 HU_neg_017 ... QC1_012_b2 (210 total)
## varLabels: subset full ... gender (10 total)
## varMetadata: labelDescription
## featureData
## featureNames: (2-methoxyethoxy)propanoic acid isomer
## (gamma)Glu-Leu/Ile ... Xanthosine (113 total)
## fvarLabels: database_identifier chemical_formula ... reliability
## (10 total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
sacSet <- metabolis::metView(sacSet)
##
##
## Data description:
##
## observations: 210
## variables: 113
## missing: 0
## 0 values (%): 0
## min: 506.5
## mean: 6700000
## median: 670000
## max: 6.8e+08
##
## Sample types:
##
## pool sample
## 26 184
sacSet <- metabolis::metCorrect(sacSet)
##
## Reference observations are: pool
##
## Processing batch:
## ne1
## ne2
features with QC coefficent of variation > 30%
sacSet <- metabolis::metView(sacSet)
##
##
## Data description:
##
## observations: 210
## variables: 113
## missing: 0
## 0 values (%): 0
## min: 268.1565
## mean: 6500000
## median: 710000
## max: 5.1e+08
##
## Sample types:
##
## pool sample
## 26 184
sacSet <- sacSet[Biobase::fData(sacSet)[, "pool_CV"] <= 0.3, ]
sacSet <- sacSet[, Biobase::pData(sacSet)[, "sampleType"] != "pool"]
print(sacSet)
## ExpressionSet (storageMode: lockedEnvironment)
## assayData: 110 features, 184 samples
## element names: exprs
## protocolData: none
## phenoData
## sampleNames: HU_neg_017 HU_neg_018 ... HU_neg_146_b2 (184 total)
## varLabels: subset full ... deci_pval (15 total)
## varMetadata: labelDescription
## featureData
## featureNames: (2-methoxyethoxy)propanoic acid isomer
## (gamma)Glu-Leu/Ile ... Xanthosine (110 total)
## fvarLabels: database_identifier chemical_formula ... pool_CV (11
## total)
## fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
## Annotation:
osmolality
Biobase::exprs(sacSet) <- sweep(Biobase::exprs(sacSet),
2,
Biobase::pData(sacSet)[, "osmolality"],
"/")
sacSet <- metabolis::metTransform(sacSet, "log10")
##
## Missing values in the 'dataMatrix': 0 (0%)
##
## Zero values in the 'dataMatrix': 0 (0%)
##
## 'log10' transformation
Hotelling, decile and missing p-values > 0.001
sacSet <- metabolis::metView(sacSet)
##
##
## Data description:
##
## observations: 184
## variables: 110
## missing: 0
## 0 values (%): 0
## min: 0.1213445
## mean: 3
## median: 3
## max: 5.9
##
## Sample types:
##
## sample
## 184
sacSet <- sacSet[, Biobase::pData(sacSet)[, "hotel_pval"] >= 0.001 &
Biobase::pData(sacSet)[, "miss_pval"] >= 0.001 &
Biobase::pData(sacSet)[, "deci_pval"] >= 0.001]
Final visual check of the data before performing the statistics
metabolis::metView(sacSet)
##
##
## Data description:
##
## observations: 182
## variables: 110
## missing: 0
## 0 values (%): 0
## min: 0.1391968
## mean: 3.1
## median: 3
## max: 5.9
##
## Sample types:
##
## sample
## 182
sacSet <- metabolis::metTest(sacSet,
factor = "gender",
testC = "limma",
adjustC = "BH")
##
## Performing 'limma'
##
## 41 variables (37%) were found significant at the 0.05 level (after the 'BH' correction).
## The first 15 are displayed below (sorted by increasing corrected p-values):
## gender_limma_Female.Male_dif
## Testosterone glucuronide 0.5815702
## p-Anisic acid -1.0003177
## Malic acid -0.2185078
## Pantothenic acid -0.2158701
## Acetylphenylalanine -0.2602737
## p-Hydroxyhippuric acid -0.1561200
## Citric acid -0.1078489
## alpha-N-Phenylacetyl-glutamine -0.1120281
## Oxoglutaric acid -0.2730331
## 4-Acetamidobutanoic acid isomer 3 -0.1762354
## N-Acetyl-aspartic acid -0.1038041
## Gluconic acid and/or isomers -0.1319615
## Monoethyl phthalate -0.3928070
## Glyceric acid -0.1312902
## (gamma)Glu-Leu/Ile 0.1276751
## gender_limma_Female.Male_BH
## Testosterone glucuronide 2.713402e-12
## p-Anisic acid 5.281909e-11
## Malic acid 7.686227e-09
## Pantothenic acid 1.021174e-07
## Acetylphenylalanine 5.422207e-06
## p-Hydroxyhippuric acid 1.694287e-05
## Citric acid 4.098478e-05
## alpha-N-Phenylacetyl-glutamine 8.992398e-05
## Oxoglutaric acid 1.313037e-04
## 4-Acetamidobutanoic acid isomer 3 1.952588e-04
## N-Acetyl-aspartic acid 1.952588e-04
## Gluconic acid and/or isomers 3.986868e-04
## Monoethyl phthalate 1.094504e-03
## Glyceric acid 1.546311e-03
## (gamma)Glu-Leu/Ile 1.758734e-03
ropls Bioconductor package
suppressMessages(library(ropls))
sacPca <- ropls::opls(sacSet)
## PCA
## 182 samples x 110 variables
## standard scaling of predictors
## R2X(cum) pre ort
## Total 0.526 9 0
plot(sacPca,
parAsColFcVn = Biobase::pData(sacSet)[, "gender"],
typeVc = "x-score")
## Warning: Character 'parAsColFcVn' set to a factor
plot(sacPca,
parAsColFcVn = Biobase::pData(sacSet)[, "age"],
typeVc = "x-score")
plot(sacPca,
parAsColFcVn = Biobase::pData(sacSet)[, "bmi"],
typeVc = "x-score")
sacSet <- ropls::getEset(sacPca)
sacSet <- metabolis::metHeatmap(sacSet, clustVi = c(5, 3))
ropls Bioconductor package
sacPlsda <- ropls::opls(sacSet, "gender")
## PLS-DA
## 182 samples x 110 variables and 1 response
## standard scaling of predictors and response(s)
## R2X(cum) R2Y(cum) Q2(cum) RMSEE pre ort pR2Y pQ2
## Total 0.278 0.731 0.562 0.261 3 0 0.05 0.05
sacSet <- ropls::getEset(sacPlsda)
biosigner Bioconductor package
suppressMessages(library(biosigner))
set.seed(123)
sacSign <- biosigner::biosign(sacSet, "gender")
## Warning: 'y' character vector converted to a factor with levels 'Female',
## 'Male'
## Significant features from 'S' groups:
## plsda randomforest svm
## Testosterone glucuronide "S" "S" "S"
## p-Anisic acid "S" "A" "S"
## Oxoglutaric acid "C" "S" "S"
## Pantothenic acid "A" "B" "S"
## Acetylphenylalanine "B" "B" "S"
## Ketoleucine "E" "E" "S"
## Malic acid "S" "E" "E"
## Taurine "E" "E" "S"
## Accuracy:
## plsda randomforest svm
## Full 0.871 0.861 0.891
## AS 0.867 0.869 0.911
## S 0.884 0.860 0.906
set.seed(NULL)
sacSet <- biosigner::getEset(sacSign)
metabolis::metWrite(sacSet, dirC = getwd())
Guitton, Yann, Marie Tremblay-Franco, Gildas Le Corguillé G., Jean-Francois Martin, Melanie Pétéra, Pierrick Roger-Mele, Alexis Delabrière, et al. 2017. “Create, Run, Share, Publish, and Reference Your LC-MS, FIA-MS, GC-MS, and NMR Data Analysis Workflows with the Workflow4Metabolomics 3.0 Galaxy Online Infrastructure for Metabolomics.” The International Journal of Biochemistry & Cell Biology 93: 89–101. doi:10.1016/j.biocel.2017.07.002.
Rinaudo, Philippe, Samia Boudah, Christophe Junot, and Etienne A Thévenot. 2016. “Biosigner: A New Method for the Discovery of Significant Molecular Signatures from Omics Data.” Frontiers in Molecular Biosciences 3: –. doi:10.3389/fmolb.2016.00026.
Thévenot, Etienne A., Aurélie Roux, Ying Xu, Eric Ezan, and Christophe Junot. 2015. “Analysis of the Human Adult Urinary Metabolome Variations with Age, Body Mass Index and Gender by Implementing a Comprehensive Workflow for Univariate and OPLS Statistical Analyses.” Journal of Proteome Research 14 (8): 3322–35. doi:10.1021/acs.jproteome.5b00354.